According to WHO statistics, many individuals suffer from visual impairments, and their number increases every year. One of their most critical needs is the ability to navigate safely, which is why researchers are working to create and improve navigation systems. This paper presents a navigation concept based on visual SLAM and YOLO using monocular cameras. Using the ORB-SLAM algorithm, our concept builds a map of a predefined route that a blind person uses most often. Because visually impaired people are curious about their environment, and to guide them properly, obstacle detection has been added to the system. Since safe navigation is vital for visually impaired people, our concept also includes a path-following component consisting of three steps: obstacle distance estimation, path deviation detection, and next-step prediction, all performed with monocular cameras.
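A minimal sketch of how the path-following step described above could be wired together is shown below; the pinhole distance estimate, thresholds, and decision rule are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: the SLAM/YOLO components are abstracted away and the
# distance/deviation heuristics below are assumptions, not the paper's code.
from dataclasses import dataclass

@dataclass
class Obstacle:
    label: str
    bbox_height_px: float  # apparent height of the detection in pixels

def estimate_distance(obstacle: Obstacle, real_height_m: float = 1.0,
                      focal_px: float = 700.0) -> float:
    """Rough monocular distance from apparent size (pinhole-camera assumption)."""
    return focal_px * real_height_m / max(obstacle.bbox_height_px, 1e-6)

def path_deviation(pose_xy, waypoint_xy) -> float:
    """Euclidean deviation of the current pose from the nearest route waypoint."""
    dx, dy = pose_xy[0] - waypoint_xy[0], pose_xy[1] - waypoint_xy[1]
    return (dx * dx + dy * dy) ** 0.5

def next_step(pose_xy, waypoint_xy, obstacle_dists, safe_dist=1.5):
    """Coarse decision rule: stop if an obstacle is too close, otherwise
    correct course when drifting off the planned route."""
    if any(d < safe_dist for d in obstacle_dists):
        return "stop"
    return "correct_course" if path_deviation(pose_xy, waypoint_xy) > 0.5 else "continue"

# Example: one obstacle roughly 5 m away, pose slightly off the route.
print(next_step((1.0, 0.3), (1.0, 0.0),
                [estimate_distance(Obstacle("person", 140.0))]))
```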
To achieve autonomy in a priori unknown real-world scenarios, agents should be able to: i) act from high-dimensional sensory observations (e.g., images), ii) learn from past experience to adapt and improve, and iii) be capable of long horizon planning. Classical planning algorithms (e.g. PRM, RRT) are proficient at handling long-horizon planning. Deep learning based methods in turn can provide the necessary representations to address the others, by modeling statistical contingencies between observations. In this direction, we introduce a general-purpose planning algorithm called PALMER that combines classical sampling-based planning algorithms with learning-based perceptual representations. For training these perceptual representations, we combine Q-learning with contrastive representation learning to create a latent space where the distance between the embeddings of two states captures how easily an optimal policy can traverse between them. For planning with these perceptual representations, we re-purpose classical sampling-based planning algorithms to retrieve previously observed trajectory segments from a replay buffer and restitch them into approximately optimal paths that connect any given pair of start and goal states. This creates a tight feedback loop between representation learning, memory, reinforcement learning, and sampling-based planning. The end result is an experiential framework for long-horizon planning that is significantly more robust and sample efficient compared to existing methods.
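As a rough illustration of the retrieve-and-restitch idea, the sketch below connects states stored in a replay buffer whenever their latent distance is small and runs a plain shortest-path search over the resulting graph; the embedding, threshold, and graph search are stand-ins for PALMER's learned representation and sampling-based planner.

```python
# Hedged sketch of restitching replay-buffer states into a start->goal path.
import heapq
import numpy as np

def latent_distance(z_a: np.ndarray, z_b: np.ndarray) -> float:
    # Stand-in for the learned "traversability" distance between embeddings.
    return float(np.linalg.norm(z_a - z_b))

def plan_over_replay(embeddings: np.ndarray, start: int, goal: int,
                     edge_thresh: float = 1.0):
    """Connect stored states whose latent distance is below a threshold, then
    run Dijkstra to restitch them into an approximately shortest path."""
    n = len(embeddings)
    dist, prev, pq = {start: 0.0}, {}, [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v in range(n):
            w = latent_distance(embeddings[u], embeddings[v])
            if v == u or w > edge_thresh:
                continue
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(pq, (d + w, v))
    path, node = [], goal
    while node != start:
        path.append(node)
        node = prev[node]          # raises KeyError if the goal is unreachable
    return [start] + path[::-1]

# Toy example: 1-D embeddings of five stored states.
z = np.array([[0.0], [0.8], [1.6], [2.4], [3.2]])
print(plan_over_replay(z, start=0, goal=4))   # -> [0, 1, 2, 3, 4]
```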
We study the problem of valuing a data owner's/seller's data for a data seeker/buyer. Data valuation is often carried out for a specific task, assuming a particular utility metric (such as test accuracy on a validation set) that may not exist in practice. In this work, we focus on task-agnostic data valuation without any validation requirements. The data buyer has access to a limited amount of data (which could be publicly available) and seeks additional data samples from a data seller. We formulate the problem as estimating the differences in the statistical properties of the seller's data with respect to the baseline data available to the buyer. We capture these statistical differences by measuring the diversity and relevance of the seller's data for the buyer, and we estimate these measures without requiring access to the raw data. The queries designed through the proposed approach keep the seller blind to the buyer's raw data and unaware of how its responses to the queries will be used, while still yielding the desired diversity-relevance trade-off. Through extensive experiments on real tabular and image datasets, we show that the proposed estimates capture the diversity and relevance of the seller's data with respect to the buyer's.
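To make the two notions concrete, the sketch below computes simple stand-in statistics for the diversity and relevance of a seller's dataset relative to a buyer's baseline; the paper's actual measures and its privacy-preserving query mechanism are not reproduced here.

```python
# Illustrative stand-ins only: simple geometric proxies for diversity/relevance.
import numpy as np

def diversity(seller_X: np.ndarray, buyer_X: np.ndarray) -> float:
    """How much of the feature space the seller covers beyond the buyer's data:
    mean distance from each seller point to its nearest buyer point."""
    d = np.linalg.norm(seller_X[:, None, :] - buyer_X[None, :, :], axis=-1)
    return float(d.min(axis=1).mean())

def relevance(seller_X: np.ndarray, buyer_X: np.ndarray) -> float:
    """How close the seller's distribution is to the buyer's:
    negative distance between the two empirical means."""
    return -float(np.linalg.norm(seller_X.mean(axis=0) - buyer_X.mean(axis=0)))

rng = np.random.default_rng(0)
buyer = rng.normal(0.0, 1.0, size=(50, 8))      # small baseline dataset
seller = rng.normal(0.5, 1.5, size=(200, 8))    # broader, shifted seller dataset
print(diversity(seller, buyer), relevance(seller, buyer))
```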
We introduce the $\pi$-test, a privacy-preserving algorithm for testing statistical independence between data distributed across multiple parties. Our algorithm relies on privately estimating the distance correlation between datasets, a quantitative measure of independence introduced in Székely et al. [2007]. We establish both additive and multiplicative error bounds on the utility of our differentially private test, which we believe will find applications in a variety of distributed hypothesis testing settings involving sensitive data.
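For reference, the sketch below computes the (non-private) sample distance correlation of Székely et al. [2007] and adds illustrative Laplace noise; the actual $\pi$-test calibrates the noise to a derived sensitivity bound, which is not reproduced here.

```python
# Hedged sketch: sample distance correlation plus a placeholder Laplace mechanism.
import numpy as np

def _double_center(D: np.ndarray) -> np.ndarray:
    return D - D.mean(axis=0, keepdims=True) - D.mean(axis=1, keepdims=True) + D.mean()

def distance_correlation(x: np.ndarray, y: np.ndarray) -> float:
    A = _double_center(np.abs(x[:, None] - x[None, :]))
    B = _double_center(np.abs(y[:, None] - y[None, :]))
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return float(np.sqrt(max(dcov2, 0.0) / denom)) if denom > 0 else 0.0

def noisy_distance_correlation(x, y, epsilon: float, sensitivity: float = 1.0):
    # `sensitivity` is a placeholder; a real DP guarantee requires the paper's
    # sensitivity analysis to set the Laplace scale correctly.
    return distance_correlation(x, y) + np.random.laplace(scale=sensitivity / epsilon)

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2 * x + rng.normal(scale=0.5, size=200)   # strongly dependent toy data
print(noisy_distance_correlation(x, y, epsilon=5.0))
```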
Compact stellar systems such as ultra-compact dwarfs (UCDs) and the globular clusters (GCs) surrounding galaxies are known to be tracers of the merger events that have formed these galaxies. Identifying such systems therefore allows us to study the mass assembly, formation, and evolution of galaxies. However, in the absence of spectroscopic information, detecting UCDs/GCs from imaging data alone is highly uncertain. Here, we aim to train machine learning models on multi-wavelength imaging data of the Fornax galaxy cluster in six filters, namely u, g, r, i, J, and Ks, to separate these objects from foreground stars and background galaxies. The object classes are highly imbalanced, which is problematic for many automatic classification techniques, so we use synthetic minority oversampling to handle the imbalance of the training data. We then compare two classifiers, namely Localized Generalized Matrix Learning Vector Quantization (LGMLVQ) and Random Forest (RF). Both methods identify UCDs/GCs with precision and recall above 93% and provide relevances that reflect the importance of each feature dimension (colors and angular sizes) for the classification. Both methods single out angular size as an important marker for this classification problem. While the astronomical expectation is that the u-i and i-Ks color indices are the most important colors, our analysis shows that colors such as g-r are more informative, possibly because of their higher signal-to-noise ratio. Beyond its excellent performance, the LGMLVQ method allows further interpretability by providing, for each individual class, the relevance of each feature, representative samples of the class, and the possibility of non-linear visualization of the data. We conclude that applying machine learning techniques to identify UCDs/GCs can lead to promising results.
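A minimal sketch of the imbalance handling and classification step, using SMOTE and a Random Forest on toy tabular features; the feature set and class fractions are placeholders, and the LGMLVQ classifier is not shown.

```python
# Sketch with toy data: SMOTE oversampling followed by a Random Forest.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(42)
# Toy imbalanced data: 5 features standing in for colour indices and angular size;
# class 1 = UCD/GC candidates (rare), class 0 = foreground stars / background galaxies.
X = rng.normal(size=(2000, 5))
y = (rng.random(2000) < 0.05).astype(int)
X[y == 1] += 1.5                      # give the minority class some separable signal

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)   # oversample the minority class
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_bal, y_bal)
print(classification_report(y_te, clf.predict(X_te)))
print("feature importances:", clf.feature_importances_)
```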
Lung cancer is one of the deadliest cancers, and its diagnosis and treatment depend in part on the accurate delineation of the tumor. Human-centered segmentation, currently the most common approach, is subject to inter-observer variability and is also time-consuming, considering that only experts can provide annotations. Automatic and semi-automatic tumor segmentation methods have recently shown promising results. However, as different researchers have validated their algorithms on various datasets with different performance metrics, reliably evaluating these methods remains an open challenge. The Lung-Originated Tumor Segmentation from Computed Tomography Scan (LOTUS) benchmark, created through the 2018 IEEE Video and Image Processing (VIP) Cup competition, aims to provide a unique dataset and predefined metrics so that different researchers can develop and evaluate their methods in a unified fashion. The 2018 VIP Cup began with global participation from 42 countries accessing the competition data. At the registration stage, 129 members formed 28 teams from 10 countries, of which 9 teams made it to the final stage and 6 teams successfully completed all the required tasks. In a nutshell, all algorithms proposed during the competition were based on deep learning models combined with false-positive reduction techniques. The methods developed by the three finalists show promising tumor segmentation results, although further effort should be devoted to reducing the false-positive rate. This competition manuscript presents an overview of the VIP Cup challenge along with the proposed algorithms and results.
Distributed multi-agent reinforcement learning (MARL) algorithms have recently attracted a surge of interest, mainly due to recent advances in deep neural networks (DNNs). Conventional model-based (MB) or model-free (MF) RL algorithms are not directly applicable to MARL problems because they rely on a fixed reward model for learning the underlying value function. While DNN-based solutions perform well when a single agent is involved, such methods fail to fully generalize to the complexity of MARL problems. In other words, although recent DNN-based approaches for multi-agent environments have achieved superior performance, they remain prone to overfitting, high sensitivity to parameter selection, and sample inefficiency. This paper proposes the Multi-Agent Adaptive Kalman Temporal Difference (MAK-TD) framework and its successor-representation-based variant, referred to as MAK-SR. Intuitively, the main objective is to capitalize on unique characteristics of Kalman filtering (KF), such as uncertainty modeling and online second-order learning. The proposed MAK-TD/SR frameworks account for the continuous nature of the action space associated with high-dimensional multi-agent environments and exploit Kalman temporal difference (KTD) to address parameter uncertainty. By leveraging the KTD framework, the SR learning procedure is modeled as a filtering problem, where radial basis function (RBF) estimators are used to encode the continuous space into feature vectors. For learning localized reward functions, on the other hand, we resort to multiple-model adaptive estimation (MMAE) to deal with the lack of prior knowledge of the observation noise covariance and the observation mapping function. The proposed MAK-TD/SR frameworks are evaluated via several experiments implemented on the OpenAI Gym MARL benchmarks.
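The sketch below illustrates two of the named ingredients on a toy problem: RBF encoding of a continuous state and a Kalman-filter-style temporal-difference update of a linear value function. It is not the full MAK-TD/SR framework (no successor representation, no MMAE), and the centres, widths, and noise terms are assumptions.

```python
# Simplified sketch: RBF features + one Kalman temporal-difference parameter update.
import numpy as np

def rbf_features(state: np.ndarray, centres: np.ndarray, width: float = 0.5):
    """Encode a continuous state as radial-basis-function activations."""
    d2 = np.sum((centres - state) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * width ** 2))

def ktd_update(theta, P, phi_s, phi_next, reward, gamma=0.95, Q=1e-4, R=1.0):
    """One Kalman TD step for a linear value function V(s) = phi(s) @ theta."""
    H = phi_s - gamma * phi_next          # observation model: r ~ H @ theta
    P = P + Q * np.eye(len(theta))        # random-walk drift on the parameters
    innovation = reward - H @ theta       # TD error as Kalman innovation
    S = H @ P @ H + R                     # innovation variance (scalar)
    K = P @ H / S                         # Kalman gain
    theta = theta + K * innovation
    P = P - np.outer(K, H) @ P
    return theta, P

centres = np.linspace(-1.0, 1.0, 5).reshape(-1, 1)   # 5 RBF centres on a 1-D state space
theta, P = np.zeros(5), np.eye(5)
phi_s = rbf_features(np.array([0.1]), centres)
phi_next = rbf_features(np.array([0.3]), centres)
theta, P = ktd_update(theta, P, phi_s, phi_next, reward=1.0)
print(theta)
```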
Lip reading is the task of recognizing speech from lip movements. It is challenging because the lip movements for different characters are similar when pronounced. Visemes are used to describe lip movements during speech. This paper shows how external text data (for viseme-to-character mapping) can be exploited by splitting video-to-character conversion into two stages: converting video to visemes, and then converting visemes to characters using a separate model. Our proposed method improves on a standard sequence-to-sequence lip-reading model by 4% on the BBC-Oxford Lip Reading Sentences 2 (LRS2) dataset.
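A toy illustration of why the second stage is needed: under a simplified character-to-viseme mapping (an assumption, not the paper's mapping), visually distinct words collapse to the same viseme sequence, which is where external text data helps.

```python
# Illustrative only: a toy character-to-viseme mapping showing the ambiguity that
# the viseme-to-character model must resolve with the help of external text data.
CHAR_TO_VISEME = {
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    "f": "V_labiodental", "v": "V_labiodental",
    "a": "V_open", "t": "V_alveolar", "d": "V_alveolar", "n": "V_alveolar",
}

def to_visemes(word: str):
    return [CHAR_TO_VISEME.get(c, "V_other") for c in word]

# "pat", "bat" and "mat" collapse to the same viseme sequence, so a model trained
# on external text is needed to pick the most likely character sequence.
for w in ("pat", "bat", "mat"):
    print(w, to_visemes(w))
```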
As data generation increasingly takes place on devices without a wired connection, machine learning (ML) related traffic will become ubiquitous in wireless networks. Many studies have shown that traditional wireless protocols are highly inefficient or unsustainable for supporting ML, which creates the need for new wireless communication methods. In this survey, we give an exhaustive review of the state-of-the-art wireless methods that are specifically designed to support ML services over distributed datasets. Currently, there are two clear themes in the literature: analog over-the-air computation and digital radio resource management optimized for ML. This survey gives a comprehensive introduction to these methods, reviews the most important works, highlights open problems, and discusses application scenarios.
Deep neural networks (DNNs) are vulnerable to a class of attacks called "backdoor attacks", which create an association between a backdoor trigger and a target label the attacker is interested in exploiting. A backdoored DNN performs well on clean test images, yet persistently predicts an attacker-defined label for any sample in the presence of the backdoor trigger. Although backdoor attacks have been extensively studied in the image domain, there are very few works that explore such attacks in the video domain, and they tend to conclude that image backdoor attacks are less effective in the video domain. In this work, we revisit the traditional backdoor threat model and incorporate additional video-related aspects into that model. We show that poisoned-label image backdoor attacks can be extended temporally in two ways, statically and dynamically, leading to highly effective attacks in the video domain. In addition, we explore natural video backdoors to highlight the seriousness of this vulnerability in the video domain. Finally, for the first time, we study multi-modal (audiovisual) backdoor attacks against video action recognition models, where we show that attacking a single modality is enough to achieve a high attack success rate.
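As an illustration of the two temporal extensions, the sketch below pastes a patch trigger either at a fixed location in every frame ("static") or at a location that drifts across frames ("dynamic"); trigger size, position, and clip shape are assumptions, not the paper's configuration.

```python
# Hedged sketch of static vs. dynamic temporal extensions of an image trigger.
import numpy as np

def apply_static_trigger(video: np.ndarray, patch: np.ndarray, y: int = 0, x: int = 0):
    """Paste the same patch at the same location in every frame of a (T, H, W, C) clip."""
    poisoned = video.copy()
    ph, pw = patch.shape[:2]
    poisoned[:, y:y + ph, x:x + pw, :] = patch
    return poisoned

def apply_dynamic_trigger(video: np.ndarray, patch: np.ndarray, step: int = 2):
    """Shift the patch location frame by frame so the trigger varies over time."""
    poisoned = video.copy()
    T, H, W, _ = video.shape
    ph, pw = patch.shape[:2]
    for t in range(T):
        x = (t * step) % (W - pw)
        poisoned[t, 0:ph, x:x + pw, :] = patch
    return poisoned

video = np.zeros((16, 112, 112, 3), dtype=np.uint8)    # toy clip: 16 frames
patch = np.full((8, 8, 3), 255, dtype=np.uint8)        # white square trigger
print(apply_static_trigger(video, patch).sum(), apply_dynamic_trigger(video, patch).sum())
```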